Saliency-guided Adaptive Seeding for Supervoxel Segmentation
We propose a new saliency-guided method for generating supervoxels in 3D
space. Rather than using an evenly distributed spatial seeding procedure, our
method uses visual saliency to guide the process of supervoxel generation. This
results in densely distributed, small, and precise supervoxels in salient
regions, which often contain objects, and larger supervoxels in less salient
regions, which often correspond to background. Our approach substantially
improves the quality of the resulting supervoxel segmentation in terms of
boundary recall and under-segmentation error on publicly available benchmarks.
Comment: 6 pages, accepted to IROS201
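The abstract only describes the seeding strategy at a high level. As a rough illustration (not the authors' implementation), the sketch below places seeds with a density proportional to a given saliency volume; the saliency input, the mixing weight, and the sampling scheme are assumptions made for this example.

import numpy as np

def adaptive_seeds(saliency, n_seeds, rng=None):
    """Sample supervoxel seed locations guided by a 3D saliency volume.

    saliency : 3D array with values in [0, 1]; high values mark salient regions.
    Returns an (n_seeds, 3) array of (z, y, x) coordinates that are dense where
    saliency is high and sparse in low-saliency (background) regions.
    """
    rng = np.random.default_rng(rng)
    # Mix saliency with a small uniform term so background still gets coverage.
    p = 0.8 * saliency + 0.2
    p = p.ravel() / p.sum()
    idx = rng.choice(saliency.size, size=n_seeds, replace=False, p=p)
    return np.stack(np.unravel_index(idx, saliency.shape), axis=1)

# Toy usage: a synthetic volume whose centre cube is "salient".
vol = np.zeros((32, 32, 32))
vol[12:20, 12:20, 12:20] = 1.0
seeds = adaptive_seeds(vol, n_seeds=200, rng=0)
print(seeds.shape)  # (200, 3)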
Small, but important: Traffic light proposals for detecting small traffic lights and beyond
Traffic light detection is a challenging problem in the context of
self-driving cars and driver assistance systems. While most existing systems
produce good results on large traffic lights, detecting small and tiny ones is
often overlooked. A key problem here is the inherent downsampling in CNNs,
leading to low-resolution features for detection. To mitigate this problem, we
propose a new traffic light detection system, comprising a novel traffic light
proposal generator that utilizes findings from general object proposal
generation, fine-grained multi-scale features, and attention for efficient
processing. Moreover, we design a new detection head for classifying and
refining our proposals. We evaluate our system on three challenging, publicly
available datasets and compare it against six methods. The results show
substantial improvements on small and tiny traffic lights, as well as strong
results across all sizes of traffic lights.
Comment: Accepted at ICVS 202
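The proposal generator is only outlined in the abstract. The toy PyTorch module below sketches one way to fuse fine-grained multi-scale features with channel attention and predict an objectness map for small objects; the layer sizes, attention design, and class name are illustrative assumptions, not the paper's architecture.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ProposalGenerator(nn.Module):
    """Toy proposal generator: fuses multi-scale CNN features at the finest
    resolution, applies channel attention, and predicts per-pixel objectness."""

    def __init__(self, channels=(64, 128, 256), fused=64):
        super().__init__()
        self.lateral = nn.ModuleList(nn.Conv2d(c, fused, 1) for c in channels)
        self.attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(fused, fused, 1), nn.Sigmoid()
        )
        self.objectness = nn.Conv2d(fused, 1, 3, padding=1)

    def forward(self, feats):
        # Project every scale to a common width and upsample to the finest
        # resolution, so small objects keep high-resolution features.
        target = feats[0].shape[-2:]
        fused = sum(
            F.interpolate(lat(f), size=target, mode="bilinear", align_corners=False)
            for lat, f in zip(self.lateral, feats)
        )
        fused = fused * self.attn(fused)               # channel attention
        return torch.sigmoid(self.objectness(fused))   # proposal heat map

# Toy usage with random multi-scale features.
feats = [torch.randn(1, c, s, s) for c, s in zip((64, 128, 256), (64, 32, 16))]
print(ProposalGenerator()(feats).shape)  # torch.Size([1, 1, 64, 64])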
Audio-Visual Speech Enhancement with Score-Based Generative Models
This paper introduces an audio-visual speech enhancement system that
leverages score-based generative models, also known as diffusion models,
conditioned on visual information. In particular, we exploit audio-visual
embeddings obtained from a self-supervised learning model that has been
fine-tuned on lipreading. The layer-wise features of its transformer-based
encoder are aggregated, time-aligned, and incorporated into the noise
conditional score network. Experimental evaluations show that the proposed
audio-visual speech enhancement system yields improved speech quality and
reduces generative artifacts such as phonetic confusions with respect to the
audio-only equivalent. The latter is supported by the word error rate of a
downstream automatic speech recognition model, which decreases noticeably,
especially at low input signal-to-noise ratios.
Comment: Submitted to ITG Conference on Speech Communication
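The conditioning step (layer-wise aggregation, time alignment, incorporation into the score network) can be sketched as follows. This is a minimal illustration under assumed dimensions; the module name, feature sizes, and score network itself are placeholders rather than the authors' system.

import torch
import torch.nn as nn
import torch.nn.functional as F

class VisualConditioning(nn.Module):
    """Toy conditioning module: aggregates layer-wise transformer features with
    learned weights, time-aligns them to the audio frame rate, and returns a
    conditioning sequence for a score network."""

    def __init__(self, n_layers=12, feat_dim=768, cond_dim=128):
        super().__init__()
        self.layer_weights = nn.Parameter(torch.zeros(n_layers))
        self.proj = nn.Linear(feat_dim, cond_dim)

    def forward(self, layer_feats, n_frames):
        # layer_feats: (n_layers, T_video, feat_dim) from an AV encoder.
        w = torch.softmax(self.layer_weights, dim=0)
        agg = (w[:, None, None] * layer_feats).sum(0)          # (T_video, feat_dim)
        cond = self.proj(agg).transpose(0, 1).unsqueeze(0)     # (1, cond_dim, T_video)
        # Time-align visual features to the audio frame rate.
        cond = F.interpolate(cond, size=n_frames, mode="linear", align_corners=False)
        return cond.squeeze(0).transpose(0, 1)                 # (n_frames, cond_dim)

# Toy usage: 12 transformer layers, 25 video frames, 100 audio frames.
feats = torch.randn(12, 25, 768)
print(VisualConditioning()(feats, n_frames=100).shape)  # torch.Size([100, 128])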